Abstract

Type IV pili are surface-exposed filaments and bacterial virulence factors, represented by the Tfpa and Tfpb types, which
assemble via specific machineries. The Tfpb group is further divided into seven variants, linked to heterogeneity in the
assembly machineries. Here we focus on PilO2Bp, a protein component of the Tfpb R64 thin pilus variant assembly
machinery from the pathogen Burkholderia pseudomallei. PilO2Bp belongs to the PF06864 Pfam family, for which an
improved definition is presented based on newly derived Hidden Markov Model (HMM) profiles. The 3D structure of the
N-terminal domain of PilO2Bp (N-PilO2Bp), reported here, is the first structural representative of the PF06864 family. N-PilO2Bp
presents an actin-like ATPase fold that is also shown to be present in BfpC, an assembly protein from a different variant; the new HMM
profiles classify BfpC as a PF06864 member. Our results provide structural insight into the PF06864 family and into the Type IV
pili assembly machinery.
Introduction

The virulence of Gram-negative bacteria is linked to the production of
different factors, including Type III secretion systems, flagella, 
capsule, lipopolysaccharide and Type IV pili (Tfp) [1]. Tfp are the 
prevalent members of the pili family, building hair-like appendages 
found on the surface of many bacteria. They participate in 
adhesion, cell-to-cell interactions, auto-aggregation, biofilm devel- 
opment, DNA exchange and motility [2]. Tfp consist of 
oligomerized pilin subunits assembled by a Tfp assembly 
machinery, which shares a common ancestor with the Type II 
secretion system. Tfp are divided into two types, Type IVa (Tfpa) 
and Type IVb (Tfpb), based on signal peptide length and the size 
of their pilin subunits [3]. The study of Tfpb assembly machinery
has been pursued to a lesser extent than the Tfpa assembly 
machinery, since it was held to be very similar to the latter; such a 
belief was disproved by the identification of Tfpb assembly 
machinery variants [4,5]. To date, seven different variants of Tfpb 
assembly machineries have been identified in Gram-negative 
bacteria: i) the Bundle-forming pilus (BFP) from enteropathogenic
Escherichia coli [6]; ii) the R64 thin pilus [5,7] of enteropathogenic
E. coli, together with other R64-related pili from Pseudomonas
aeruginosa [8] and Salmonella enterica [9]; iii) the longus (lng) pilus
from enterotoxigenic E. coli [10]; iv) the Cof or CFA/III pilus of E.
coli [11]; v) the toxin co-regulated pilus (TCP) of Vibrio cholerae [12];
vi) the Cpa pilus of Caulobacter crescentus [13]; and vii) the fibril-
associated protein (Flp) or tight adherence (Tad) pilus of
Aggregatibacter actinomycetemcomitans [14].

All components of the Tfpb assembly machinery have now been 
identified; some (the core proteins) are common to all Tfp systems, 
while others are specific to each machinery. Little is known about 
the interactions of such specific proteins in the context of pilus 
biogenesis, and their functions remain unknown. The global 
architecture of the Tfpb assembly apparatus is characterized by 
two sub-assemblies present in the outer (OM) and the inner 
membrane (IM), respectively, spanning the bacterial envelope. 
Only two 3D-structures of components of the Tfpb-specific 
assembly apparatus are known, both being putative stabilizers of 
the IM complex; these are the N-terminal domain of BfpC [15], 
and the TadZ protein [16]. 

A previous study by Essex-Lopresti et al. [17] identified eight
Tfp-associated loci in the genome of B. pseudomallei K96243, a 
pathogenic Gram-negative bacterium responsible for melioidosis, 
an often fatal infectious disease that is endemic in tropical areas, 
particularly in Thailand and northern Australia [18]. Among 
these, the Tfp7 locus was recognized as a putative operon/regulon 
containing nine genes coding for proteins BPSS1593 to
BPSS1601. Tfp7 is of particular interest as it encodes a Tfpb
that is closely related to all members of the R64 thin-pilus variant.
The operons of this variant share, along with Tfp7, genes encoding
proteins belonging to the same Pfam [19] families, and particularly
a specific protein, BPSS1599, of unknown function.

Pfam analysis of BPSS1599, a predicted accessory Pilus
assembly protein 2 (PilO2Bp), revealed an inconsistency between
the HMM profile of the PF06864 Pfam family and the proposed
PilO2Bp sequence alignment. Based on sequence searches, we
present three new HMM profiles that allow a full alignment of
PilO2Bp with its Pfam family. Moreover, we report the 1.55 Å
crystal structure of the N-terminal domain of PilO2Bp (residues
1-192; N-PilO2Bp), solved by S-SAD phasing, providing a first
glimpse of the 3D structural properties of the PF06864 family,
which previously lacked a structural representative. We show that
N-PilO2Bp consists of two α/β sub-domains separated by a
prominent central cleft, typical of actin-like ATPases. Additionally,
in line with the description of PilO2Bp as a putative Tfpb assembly
machinery protein, N-PilO2Bp is found to be homologous to
N-BfpC, another cytoplasmic protein involved in a Tfpb assembly
system. Interestingly, BfpC is a specific protein of the Bundle-
forming pilus variant, which differs from the R64 thin-pilus variant
to which PilO2Bp belongs. The two structures were found to be very
similar, despite the lack of any significant sequence identity, and
despite the fact that BfpC had not initially been recognized as a
PF06864 member. Our improved HMM profiles and structural
data, taken together, update and improve sequence alignments of
the PF06864 family, and shed light on protein 3D structures in the
Tfpb assembly machinery.

Materials and Methods 

Cloning, purification and crystallization 

The 5' end of the BPSS1599 gene (NCBI accession number
YP_111607.1), coding for N-PilO2Bp, amino acids 1-192, was
amplified by PCR from genomic DNA (Prof. Titball's group,
University of Exeter, UK) from B. pseudomallei strain K96243 using
the primers PilO2-F1 (5'-CACCATGAGCGCGCAGGTG-3')
and PilO2-R1 (5'-CTACGACAGACGCCGCTCG-3') for insertion
into the pET151/TOPO vector (Life Technologies). The
same protocol was applied to the 3' end of the BPSS1599 gene,
coding for amino acids 221 to 432. Successful cloning and PCR
fidelity were confirmed by sequencing (BMR Genomics S.r.l.,
Padova). N-PilO2Bp and C-PilO2Bp domains were expressed as
N-terminal His-tag fusion proteins in C41 (DE3) E. coli cells in
Luria-Bertani broth, inducing with 0.5 mM IPTG at 18°C overnight.
Bacterial cells from a 1 L culture were harvested and lysed in
Buffer A (300 mM KCl, 5 mM imidazole, 50 mM KH2PO4
pH 8), containing lysozyme (0.25 mg/ml), DNases (20 µg/ml) and
10 mM MgCl2. Following sonication and centrifugation, the
protein of interest was purified using an automated purification
protocol (BIORAD Profinia system). To this end, the soluble
fraction was loaded onto a 5 ml Bio-Scale Mini Profinity IMAC
cartridge, pre-equilibrated with Buffer A. The protein was then
eluted using the standard native IMAC protocol available on the
Profinia system, with the addition of 20% glycerol to the elution
buffer. Pure fractions, as judged by SDS-PAGE, were pooled and
concentrated. The His-tag was removed using the AcTEV
protease™ (Life Technologies), incubating 10 mg protein with
55 µl of the protease (10 U/µl) overnight, at room temperature
with mild agitation, according to the manufacturer's instructions. The
His-tag and the protease were removed using the same Profinia
system and a 5 ml Bio-Scale Mini Profinity IMAC cartridge. The
fraction containing cleaved protein was exchanged into 10 mM
Tris-HCl, pH 8 and 20% glycerol and concentrated to 8 mg/ml
for crystallization trials. N-PilO2Bp crystals containing phosphate
were grown in sitting drops at 20°C, in 300 nl droplets containing
50% protein (8 mg/ml) and 50% reservoir solution (1.3 M
sodium-potassium phosphate buffer pH 7.8), using an Oryx8
robot (Douglas Instruments). Crystals lacking the bound
phosphate ion were obtained from 300 nl sitting drops grown at 20°C,
containing 50% protein solution (6 mg/ml) and 50% reservoir
solution (0.9 M sodium-potassium phosphate buffer pH 7.0).
Crystals were cryo-protected in a solution containing the
appropriate buffer and 15% glycerol.

Generation and validation of the HMM profiles for Pfam 
family PF06864 

A sequential strategy was applied to improve the HMM profiles
for the PF06864 family. First, the full sequences of the PF06864
RP15 group members, including PilO2Bp, were realigned against
the original PF06864 HMM profile using the 'hmmalign' tool
from the HMMER3 package (Alignment 3). Residues comprising
alignment positions 1 to 170 in every sequence were then extracted
from Alignment 3 for independent alignment. Alignment position
171 contains PilO2Bp Ala93, the first residue aligned with the
PF06864 profile. The isolated N-terminal sequences were then
used as input for multiple alignment with 'T-Coffee' [20], run in
three modes: default parameters (Alignment 4A), accurate mode
using the EBI psi-blast client (Alignment 4B) and accurate mode
using the NCBI blastp client (Alignment 4C). Alignments 4A, B
and C were merged with Alignment 3 by substitution of the first
170 residue positions in the latter, thus generating three new
alignments, with complete sequences, for the PF06864 RP15
group (Alignments 5A-C). These alignments were used to
generate three new HMM profiles (Default HMM profile, EBI
HMM profile and NCBI HMM profile) with the 'hmmbuild' tool
of HMMER3. In order to validate the new HMM profiles, the
original PF06864 seed sequences were aligned against them,
obtaining three new multiple alignments (Alignments 8A-C).
Alignments 8A-C were then compared to the PF06864 seed
multiple alignment using the T-Coffee 'profile vs profile' function,
resulting in a score of 98 (out of 100) using the Default and NCBI
HMM profiles (Alignments 9A and 9C, respectively) and a score
of 97 using the EBI HMM profile (Alignment 9B). As a reference,
a score of 99 is obtained when comparing Pfam's PF06864 seed
multiple alignment against itself (Alignment 9D). Finally, the
alignment of PilO2Bp and of the rest of the PF06864 RP15 group
members (Alignments 6A-C) with the new profiles was analyzed.
As expected, the new profiles now cover the whole sequence, in
contrast to Alignment 1, which excludes the first 92 residues.
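
The merging step described above (substituting the first 170 alignment positions of Alignment 3 with a T-Coffee realignment) can be illustrated with a minimal Python sketch. This is not the authors' script: the function names, the dictionary representation of an alignment and the toy sequences are assumptions made for illustration only. The sketch checks that the realigned block carries the same residues as the block it replaces, so that only gap placement changes.

```python
def ungap(block):
    """Strip gap characters ('-' and '.') and normalise case."""
    return block.replace("-", "").replace(".", "").upper()


def merge_n_terminal(full_alignment, n_term_alignment, n_cols=170):
    """Replace the first n_cols columns of every row with a realigned block.

    Both arguments map a sequence ID to its aligned row (hypothetical
    in-memory representation; real alignments would be parsed from files).
    """
    if set(full_alignment) != set(n_term_alignment):
        raise ValueError("alignments must contain the same sequence IDs")
    if len({len(row) for row in n_term_alignment.values()}) > 1:
        raise ValueError("realigned N-terminal rows must have equal length")

    merged = {}
    for seq_id, row in full_alignment.items():
        new_block = n_term_alignment[seq_id]
        # Only gap positions may differ between the old and the new block.
        if ungap(row[:n_cols]) != ungap(new_block):
            raise ValueError(f"residues differ for {seq_id}")
        merged[seq_id] = new_block + row[n_cols:]
    return merged


# Toy example with a 4-column N-terminal block instead of 170 columns.
alignment_3 = {"seqA": "MK--AVLT", "seqB": "--MRAVIT"}
alignment_4 = {"seqA": "MK-", "seqB": "-MR"}
alignment_5 = merge_n_terminal(alignment_3, alignment_4, n_cols=4)
```

In this toy case the merged rows keep the C-terminal columns of the original alignment untouched and only the N-terminal block is replaced.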

X-ray diffraction Data Collection, Structure Determination 
and Refinement 

Successful S-SAD phasing often relies on the presence of
additional/unexpected weak anomalous scattering species (a
phosphate ion in our case) in addition to the protein Met/Cys
sulphur atoms. The method requires a long wavelength for the
incident X-ray beam in order to maximize the f″ anomalous
contribution of the S atoms. When using a long wavelength (lower
X-ray energy) for data collection, harmonic contamination of
the X-rays affects the anomalous signal. Suppression or reduction
of higher harmonic contamination in the primary X-ray beam is
an essential precondition for the success of an S-SAD experiment. All
diffraction datasets for N-PilO2Bp (containing phosphate) were
collected on the BM14 beam line, at the European Synchrotron
Radiation Facility (ESRF, France) using a MAR 225 CCD
detector on a single good quality tetragonal crystal (space group
P4₁2₁2; unit cell edges a = b = 56.0 Å; c = 117.0 Å;
α = β = γ = 90.0°). The 1.55 Å resolution data set was collected
at a beam energy of 12.7 keV, and treated as a native dataset. The
sulfur-SAD data sets were collected at 7 keV (λ = 1.7712 Å) by
exploiting the goniostat κ geometry (κ = 0°, κ = 35° and κ = 70°),
to limit systematic errors associated with X-ray absorption or
radiation damage, and to achieve high multiplicity within the
collected data. The harmonic contamination from 21 keV (for the
7 keV set) was reduced by offsetting the second crystal of the beam
monochromator using the pusher value -0.25. Datasets were
integrated with the program HKL2000, and scaled with
SCALEPACK [21]. Data collection statistics are shown in
Table 1. Attempts at S-SAD phasing were successful when
employing the merged κ = 0° and κ = 35° S-SAD datasets,
measured to 1.9 Å resolution. The ShelxC/D/E programs
embedded in the HKL2MAP application were used to analyze
the heavy-atom substructure of the datasets (ShelxC), locate the
anomalous scatterers (ShelxD), and extract phase information
(ShelxE) [22]. About 100 trial runs of ShelxD were performed to
find the correct positions of the anomalous scatterers. All four
expected S atoms were located along with an additional peak for a
phosphorus atom. The 'heavy atoms' were subjected to 20 cycles of
phase refinement in ShelxE, and three cycles of model tracing,
while extending data resolution, using the free lunch algorithm in
ShelxE. The experimental phases and the linked model were
employed for automated model building in the Phenix AutoBuild
program [23-27], which built a model consisting of
194 residues, with an overall model/map correlation coefficient of
0.884. Inspection of the map confirmed the presence of one
phosphate ion, and of two additional residues at the N-terminus
(residues belonging to the vector used). Several rounds of manual
model building with COOT, and refinement with the program
REFMAC5, were carried out [26,28,29]. The structure was
refined to Rcryst = 0.185 and Rfree = 0.223, and the quality of
the model was checked with PROCHECK [30]. The final refinement
statistics and quality parameters are shown in Table 2. The
diffraction dataset for PilO2Bp devoid of phosphate was collected
on the ID23-1 beam line, at the European Synchrotron Radiation Facility
(ESRF, France), on a single good quality tetragonal crystal (space
group P4₁2₁2, unit cell edges a = b = 52.7 Å; c = 127.0 Å;
α = β = γ = 90.0°). The data were integrated with the program
iMOSFLM and scaled with SCALA; data collection statistics are
reported in Table 1. Phases were obtained by molecular replacement
using the structure of N-PilO2Bp bound to phosphate. The
structure was further completed manually and refined by cycling
between the COOT and REFMAC5 programs, to
Rcryst = 0.194 and Rfree = 0.250, and the quality of the model was checked
with PROCHECK. The final refinement statistics and quality
parameters are shown in Table 2. The atomic coordinates and
structure factors for N-PilO2Bp with phosphate, and without the
bound phosphate, were deposited in the RCSB Protein Data Bank
under accession codes 4BYZ and 4BZ0, respectively [31].
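
The relation between the beam energies and wavelengths quoted above follows from λ = hc/E. The short Python sketch below is an illustration rather than part of the original protocol; the function names are hypothetical, and reading the 21 keV contamination as the third harmonic of the 7 keV beam is an inference from the ratio of the two energies.

```python
# h*c expressed in keV·Å, so that wavelength (Å) = HC_KEV_ANGSTROM / energy (keV).
HC_KEV_ANGSTROM = 12.398419843


def energy_to_wavelength(energy_kev):
    """Convert a photon energy in keV to a wavelength in Å."""
    return HC_KEV_ANGSTROM / energy_kev


def harmonic_energy(fundamental_kev, order):
    """Energy of the n-th harmonic passed by a monochromator set to fundamental_kev."""
    return fundamental_kev * order


sad_wavelength = energy_to_wavelength(7.0)      # S-SAD data sets
contaminant_kev = harmonic_energy(7.0, 3)       # 21 keV, assumed third harmonic
contaminant_wavelength = energy_to_wavelength(contaminant_kev)
```

For the 7 keV setting this reproduces the 1.7712 Å wavelength reported for the S-SAD data, and the 21 keV contaminant corresponds to one third of that wavelength.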

Results 

PilO2Bp is a component of the Tfpb R64 thin pilus variant

PilO2Bp is described as belonging to the PF06864 Pfam family,
which consists of several enterobacterial specific PilO proteins;
PilO2Bp is annotated as a specific structural part of a Tfpb
assembly apparatus. Notably, two PilO Pfam families have been
described and documented in a recent review [32]. The first PilO
family, PF04350, hosts proteins of the Tfpa assembly machinery
(no homologs to members of this family have been detected in B.
pseudomallei K96243). The second PilO family, PF06864, includes
proteins involved in the Tfpb R64 thin pilus variant, such as
PilO2Bp. To ease nomenclature, in the following we will refer to
the PF06864 family as PilO2, and add the suffix 2 to all
components of the Tfpb R64 thin pilus variant (Pil).

In support of the Pfam family description, we noted that,
according to the Burkholderia Genome Database [33], PilO2Bp
belongs to a predicted operon (positioned from 2167547 to
2177170 bp on chromosome 2 of B. pseudomallei K96243, with an
estimated length of 9623 bp), composed of nine open reading
frames coding for putative Pil proteins (PilV2, M2, S2, R2, Q2,
P2, O2, N2) and for the TcpQ2 protein. Downstream of this
operon, a pilT2 gene codes for the PilT2 protein, an additional
component of the Tfp7 assembly machinery (Fig. 1). The
organization of Tfp7 is well conserved in all Burkholderia strains,
suggesting a very ancient origin for this genetic element. Although
the genetic organization is not conserved among the Tfpb R64
thin pilus variant operons, all genes are orthologs (Fig. 1). The R64
operon includes 14 genes (pilI2, J2, K2, L2, M2, N2, O2, P2, Q2,
R2, S2, T2, U2, V2) [5,7], and was originally described in S. enterica
serovar Typhimurium; the second operon, reported in S. enterica
serovar Typhi and serovar Dublin [34], in plasmid pSERB1 from
enteroaggregative E. coli [35] and in Y. pseudotuberculosis [36], lacks
the pilI2, J2, K2 and T2 genes. The organization of the pil operon
in PAPI-1 from P. aeruginosa is somewhat different, since the pilM2
gene is at the end of the operon and pilU2 is absent, the final
operon being structured as follows: pilL2N2O2P2Q2R2S2T2V2M2.
The Tfp7 operon of B. pseudomallei is also reorganized,
as follows: pilV2M2S2R2Q2P2O2N2, plus TcpQ2; pilT2 is found as
a single gene downstream of the Tfp7 operon. Thus PilL2 is missing,
although it could be replaced by TcpQ2, which has a similar
function. PilU2 is distantly related to PilD, a component of the
Tfpa machinery involved in the processing of the prepilin protein
PilA, the major pilin subunit. As in PAPI-1, pilU2 is absent from the
Tfp7 operon of B. pseudomallei. In the case of R64, pilU2 encodes a prepilin
peptidase that cleaves PilS2, whereas in P. aeruginosa PilD from its
Tfpa machinery cleaves PilS2. In B. pseudomallei, where six other
Tfps are present, we suggest that one of the prepilin peptidases
from a different machinery may process Tfp7 PilS2. The pilO2
gene is present in all the Tfpb R64 thin pilus variant operons;
however, its sequence has evolved divergently, leading to reduced
sequence homology between pilO2Bp and the other pilO2
components.

PilO2Bp within the PF06864 Pfam family

As already mentioned, based on the Pfam sequence-search tool
PilO2Bp (UniProt [37] code Q63JW5) was assigned to family
PF06864. In fact, PilO2Bp is part of the alignment of the proteins
from the 15% representative proteomes (RP15) [38] for this family
(Alignment 1). However, alignment of PilO2Bp to the PF06864
profile begins at Ala93, thus excluding the first 92 residues and
casting doubts on whether PilO2Bp is correctly assigned only as a
PilO2 protein. Pfam reports residues 12 to 92 as belonging to the
'envelope'; however, they are outside the alignment and a large gap
is introduced in their place. After careful examination of the
PF06864 RP15 it was possible to manually align a short motif in
the N-terminal region of these proteins (Alignment 2). This
suggested that the first 92 PilO2Bp residues may in fact belong to
the PilO2 profile. In order to confirm this hypothesis, a sequential
strategy, schematized in Fig. 2, was applied. To see whether the
first positions in the family profile may be improved, new HMM
profiles were generated. To this end, the program T-Coffee [20]
was run in three different modes: default parameters, accurate
Table 1. Crystallographic data-collection statistics.

| Data | S-SAD | Native + PO4 | Native |
| --- | --- | --- | --- |
| X-ray source | ESRF BM14 | ESRF BM14 | ESRF ID23-1 |
| Wavelength (Å) | 1.7712 | 0.97872 | 1.06890 |
| Space group | P4₁2₁2 | P4₁2₁2 | P4₁2₁2 |
| Cell parameters (Å, °) | a = b = 56.0; c = 117; α = β = γ = 90 | a = b = 56.0; c = 117; α = β = γ = 90 | a = b = 52.73; c = 127; α = β = γ = 90 |
| No. of molecules in an asymmetric unit | 1 | 1 | 1 |
| Resolution range (Å) | 50-1.9 (1.9-1.93) | 50-1.55 (1.55-1.58) | 42.33-1.76 (1.76-1.86) |
| Total reflections | 544748 | 287763 | 103017 |
| Unique reflections | 18140 | 27940 | 16362 |
| Completeness (%) a,b | 100 (100) | 99.9 (100) | 88 (99.1) |
| Redundancy b | 30.0 (28.8) | 10.3 (10.1) | 6.3 (5.9) |
| Mean I/σ(I) b | 89.1 (2.92) | 35.9 (2.95) | 16.6 (5.0) |
| Rmerge (%) b,c | 6.0 (16.7) | 5.7 (68.3) | 5.8 (22.1) |
| Phasing d | | | |
| ShelxD data used (Å) | 2.7 | | |
| Correlation coefficient (CC) e, ShelxD CCall/CCweak | 37.83/23.65 | | |
| ShelxE figure of merit (FOM) f | 0.709 | | |

a Data completeness treats Bijvoet mates independently.
b Statistics for the highest resolution shells are given in parentheses.
c $R_{merge} = \sum_{hkl}\sum_i |I(hkl)_i - \langle I(hkl)\rangle| / \sum_{hkl}\sum_i \langle I(hkl)_i\rangle$.
d Substructure determination parameters are from ShelxD.
e $CC = [\sum wE_oE_c \sum w - \sum wE_o \sum wE_c] / \{[\sum wE_o^2 \sum w - (\sum wE_o)^2][\sum wE_c^2 \sum w - (\sum wE_c)^2]\}^{1/2}$,
where w is weight. CCall/CCweak is the correlation coefficient for all and weak reflections of the best solution.



mode using the EBI psi-blast client, and accurate mode using the
NCBI blastp client. The three new HMM profiles and all
generated alignments are available for download at
http://bioinf.uab.cat/newPF06864hmmprof/. Three complete alignments
of the full-length protein, one for each of the profiles that we shall
call Default HMM profile, EBI HMM profile and NCBI HMM profile,
respectively, were then obtained and validated for PilO2Bp
(Alignments 10A, B and C). The three profiles produced almost
identical results, showing mismatches in only 4 out of the 432
sequence positions. Based on these results, we suggest that the
PF06864 Pfam family should be assigned an improved HMM
profile.

Updated HMM profiles identify new members of the 
PF06864 family 

The PF06864 members are described as pilin accessory protein
(PAP_PilO) in the Pfam database. The family is composed of 257
protein members, with proteins A3JI41 and Q6EVW5 (UniProt
Id) having the lowest and highest scores, respectively. Six of these
proteins have, however, been removed from UniProt, therefore
reducing the actual set to 251 members. In a search for new
members of PF06864, we first scanned the whole
uniprot_trembl_bacteria database (http://www.uniprot.org) with the
original Pfam HMM profile. Using the hmmsearch program from
HMMER3 [39], 391 matches were found with significance above
the default threshold, including the 251 proteins reported in Pfam
for PF06864 and with A3JI41 as the member with lowest score. All
391 sequences are annotated in UniProt as belonging to the
PF06864 family.



f FOM, figure of merit = $|F(hkl)_{best}| / |F(hkl)|$; $F(hkl)_{best} = \sum_\alpha P(\alpha) F_{hkl}(\alpha) / \sum_\alpha P(\alpha)$.
doi:10.1371/journal.pone.0094981.t001
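
As an illustration of how the merging statistics in Table 1 are defined, the following Python sketch computes Rmerge and the redundancy from a list of repeated intensity measurements. It is a simplified stand-in for what scaling programs such as SCALEPACK or SCALA report, not their actual implementation: the function name and the toy data are hypothetical, and the sketch assumes that symmetry-equivalent reflections have already been mapped to a common (h, k, l) index.

```python
from collections import defaultdict


def merging_stats(observations):
    """Compute R_merge and redundancy from (hkl, intensity) observations."""
    groups = defaultdict(list)
    for hkl, intensity in observations:
        groups[hkl].append(intensity)

    numerator = 0.0    # sum over hkl and i of |I_i - <I>|
    denominator = 0.0  # sum over hkl and i of I_i (equal to summing <I> n times)
    for intensities in groups.values():
        mean_intensity = sum(intensities) / len(intensities)
        numerator += sum(abs(i - mean_intensity) for i in intensities)
        denominator += sum(intensities)

    total = sum(len(v) for v in groups.values())
    return {
        "r_merge": numerator / denominator,
        "redundancy": total / len(groups),
        "unique": len(groups),
    }


# Toy data: two unique reflections, each measured twice.
toy = [((1, 0, 0), 100.0), ((1, 0, 0), 110.0),
       ((0, 1, 0), 50.0), ((0, 1, 0), 50.0)]
stats = merging_stats(toy)
```

The redundancy defined this way is simply the ratio of total to unique reflections; for the S-SAD data set in Table 1, 544748/18140 gives approximately 30.0, in line with the tabulated value.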



Table 2. Refinement and Ramachandran plot statistics.

| | Native + PO4 | Native |
| --- | --- | --- |
| Resolution range (Å) | 1.58-1.55 | 1.86-1.76 |
| Reflections used for refinement (all) | 26448 | 15597 |
| Reflections used for Rfree | 1438 | 851 |
| Rcryst (%) a | 18.6 | 19.4 |
| Rfree (%) | 22.3 | 25.0 |
| RMSD bond lengths (Å) | 0.008 | 0.009 |
| RMSD bond angles (°) | 1.548 | 1.372 |
| B-factors (Å²): protein | 19.2 | 29.8 |
| B-factors (Å²): water | 30.5 | 36.4 |
| B-factors (Å²): phosphate ion | 23.4 | |
| B-factors (Å²): potassium ions | | 30.1 |
| Ramachandran favored region (%) | 92.6 | 94.8 |
| Additional allowed region (%) | 7.4 | 4.6 |
| Generously allowed regions (%) | 0.0 | 0.0 |
| Outliers b (%) | 0.0 | 0.65 |

a $R_{cryst} = \sum_{hkl} ||F_o(hkl)| - k|F_c(hkl)|| / \sum_{hkl} |F_o(hkl)|$, where $F_o$ and $F_c$ are observed and
calculated structure factors.
b The outlier in the native structure corresponds to conformer A of Arg10,
present in a flexible loop region of the structure.
doi:10.1371/journal.pone.0094981.t002
Figure 1. Comparison of Tfpb machinery R64 thin pilus variant encoding operons for different microorganisms. The alignment was
performed using tblastx from the Blast suite, and visualized in Artemis Comparison Tool. Conserved protein regions are paired by color-shaded
regions; the blue and red colors represent the reverse and forward matches, respectively, and color intensity is proportional to the sequence
homology. Genes are represented by arrows; the same arrow color indicates putative orthologs. The grey arrows represent genes lacking homologs
among represented pil clusters. The pil cluster sequences were retrieved from GenBank: Tfp7 locus from B. pseudomallei (Bp) chromosome 2 complete
sequence, BX571966.1; PAPI-1 pil gene cluster from P. aeruginosa (Pa) PA14, AY273869.1; R64 transfer region, AB027308.1; and pil operon from
Salmonella enterica (Se) subsp. enterica serovar Paratyphi C strain CN13/87, AY249242.1.
doi:10.1371/journal.pone.0094981.g001



We then scanned the uniprot_trembl_bacteria database using
our three new HMM profiles. Out of the 251 PF06864 members,
250 scored over the significance threshold. The missing sequence
(UniProt Id G1ZDN7) also yielded a positive match when
manually aligned against the profiles using the hmmalign program
from HMMER3 (204 of the 246 residues in the sequence align using
the original Pfam profile, while 184 residues align based on the
new Default and NCBI HMM profiles, and 183 with the EBI
profile). In addition, and notably, hmmsearch identified 182
supplementary sequences matching the EBI HMM profile, two of
which (G4FYD9 and K2RGC3) have recently been removed from
UniProt, and 179 sequences matching the Default and NCBI



[Figure 2: flowchart. Alignment 1 (PF06864 RP15 group, without N-terminal sequences) → manual alignment → Alignment 2 (common N-terminal motif). Full sequences of RP15 group + original PF06864 profile → hmmalign → Alignment 3 (full sequences of PF06864 RP15 group with N-termini unaligned). Isolated N-terminal sequences not aligned with the PF06864 profile → T-Coffee (default parameters; accurate mode using EBI psi-blast; accurate mode using NCBI blastp) → Alignments 4A, 4B, 4C → merged with Alignment 3 → Alignments 5A, 5B, 5C → hmmbuild → Default, EBI and NCBI HMM profiles. Full sequences of RP15 group → hmmalign → Alignments 6A, 6B, 6C.]

Figure 2. Schematic view of the sequential strategy applied to generate PF06864 Pfam family improved HMM profiles. See the main
text and Supplementary Fig. S1 for alignment coding.
doi:10.1371/journal.pone.0094981.g002
HMM profiles. The latter 179 sequences are fully contained in the
EBI-profile-matching set of 180. Out of the 180 additional
proteins, 137 are already annotated in UniProt as members of the
PF06864 family. Of the remaining 43 proteins, eight are
annotated as PilO or pili-related proteins, two as ATPase, two
as BfpC, and 31 as putative or uncharacterized proteins (Table
S1). A search in the Swiss-Prot database for proteins matching the
three new profiles showed that the majority of matches belong to
plant or animal bacterial pathogens, with the remainder being
symbionts. Remarkably, similar results were obtained for the 43
newly-identified putative members of the family.
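
The bookkeeping behind these comparisons (which known family members are recovered by a profile, which are missed, and which hits are new) amounts to set operations on hmmsearch results. The Python sketch below is one possible way to do this and is not taken from the original study. It assumes hit tables written with the hmmsearch `--tblout` option, in which comment lines start with `#` and the first whitespace-separated field is the target name; the function names and the toy identifiers are hypothetical.

```python
def read_tblout_targets(lines):
    """Collect target names from lines of an hmmsearch --tblout table."""
    targets = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        targets.add(line.split()[0])  # first column is the target name
    return targets


def compare_hits(known_members, new_hits):
    """Split hits into recovered members, missed members and additional hits."""
    return {
        "recovered": known_members & new_hits,
        "missed": known_members - new_hits,
        "additional": new_hits - known_members,
    }


# Toy table with made-up identifiers and scores.
toy_table = [
    "# target name  accession  query name  accession  E-value  score  bias",
    "A1  -  PF06864  -  1e-50  170.0  0.1",
    "B2  -  PF06864  -  1e-10   40.0  0.0",
    "",
]
hits = read_tblout_targets(toy_table)
summary = compare_hits({"A1", "C3"}, hits)
```

Applied to real tables, the sizes of the three sets would correspond to counts such as the 250 recovered members, the single missed sequence and the additional matches discussed above.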

S-SAD crystal structure analysis of PilO2Bp

The function of PilO2 within the Tfpb assembly machinery is
not documented; however, PilO2 is reported to be an assembly
protein that localizes in the cytoplasm in the absence of other Pil
proteins, translocating to the OM in their presence [5]. Given the
absence of a signal peptide, the authors suggest that this
unexpected intracellular localization is in fact due to complex
formation with OM-localized proteins and they propose further
investigations into the matter [5]. In order to study the 3D
structure of PilO2Bp, we first had to define the best strategy to
produce the recombinant protein in stable and soluble form. To
verify the possible IM localization of PilO2Bp, diverse prediction
packages were used. SignalP 4.1 [40] predicted the absence of a
signal peptide, supporting the hypothesis of cytoplasmic
localization. Predicting PilO2Bp topology proved more demanding, due to
inconsistent results produced by the several prediction programs
used. FFPred, from the PSIPRED website [41], predicted the
most plausible topology, with a cytoplasmic localization for the
N-terminal domain (1-194), a transmembrane domain (195-214), and
a periplasmic localization for the C-terminal domain (215-432).
Based on the information emerging from the above predictive
approach, an N-terminal domain construct (residues 1-192:
N-PilO2Bp) was designed. Following expression and purification of
the recombinant N-PilO2Bp, the 194-residue protein (the first two
N-terminal residues are from the vector) was crystallized
using the sitting drop vapor diffusion method. The tetragonal
crystals, grown from sodium/potassium phosphate solutions at
pH 7-8, proved of excellent diffraction quality (see Materials and
Methods and Table 1).

Due to the lack of suitable structural homologs in the Protein
Data Bank (PDB), molecular replacement could not be used to solve
the N-PilO2Bp 3D structure. Thus, considering the
availability of 1.5 Å resolution data for this protein, the single-
wavelength anomalous diffraction (SAD) phasing method was
adopted based on the four intrinsic sulphur atoms present in native
N-PilO2Bp. S-SAD data collection and phasing were conducted at
the BM14 diffraction beam line at the ESRF (Grenoble, France),
using a 1.77 Å X-ray source wavelength. Four sulphur and one
phosphorus anomalous scatterers were located during the phasing
procedure. The N-PilO2Bp 3D structure was then refined using
data at 1.55 Å resolution, to R-factor and R-free values of 0.186
and 0.223, respectively (Tables 1-2; for further information see
Materials and Methods).

N-PilO2Bp 3D structure

N-PilO2Bp crystals contain one protein chain (194 residues) per
asymmetric unit, structured into two similar sub-domains, each
displaying α/β topology, separated by a cleft (Fig. 3A & 3B). This
fold, according to the SCOP nomenclature, is typical of proteins
belonging to the actin-like ATPase domain superfamily [42].
Indeed, this particular structure results from the duplication of the
ribonuclease H-like motif, which consists of three layers (α/β/α),
hosting a mixed 5-stranded β-sheet. Such features are conserved in
N-PilO2Bp (Fig. 3A & 3B). The protein is divided into two sub-
domains. Sub-domain 1 comprises 7 β-strands (β1-6 and β16), one
3₁₀ helix and 2 α-helices (α1-α2), comprising mainly N-terminal
residues, except for β-strand 16, formed by C-terminal residues
185-187. As described for the ribonuclease H-like motif, β5 is anti-
parallel to β3 and β4 in the sub-domain 1 β-sheet. As for sub-
domain 1, sub-domain 2 commences with a β-hairpin (β7-β8) that
extends along the lower back of the protein (Fig. 3A & 3B).
Sub-domain 2 is composed of 7 β-strands (five of which, β9-β13, form a
β-sheet), 2 α-helices and two 3₁₀ helices. From β13, the
polypeptide forms an irregular loop that wraps around the back of
both sub-domains to finish at the side of sub-domain 1. The two
3₁₀ helices are present in this extended loop alongside β14. The
cleft at the front of the protein separates sub-domains 1 and 2
(Fig. 3A). The peripheral β-strands (β3 and β11) of the two main
β-sheets run antiparallel to each other, and form the walls/floor of
the cleft. In full-length PilO2Bp, Ser192 at the C-terminus of
N-PilO2Bp is followed by three residues (Pro-Arg-Ala), and then by
the putative transmembrane segment 195-215.

N-Pil02 Bp intermolecular association and biological unit 

Although the crystal structure of N-Pil02 Bp displays one protein 
chain per asymmetric unit, inspection of crystal packing highlights 
three differently packed 'dimers' (Fig. 3C). The first 'dimeric' 
interface results from the interaction of two N-Pil02 Bp molecules 
via a phosphate ion. As explained in the following paragraph, the 
phosphate ion (that was located through its anomalous scattering 
signal) could mimic a phospholipid head that may be the true 
molecular partner recognized by this protein region (res 42-43, 60- 
61 and 186-192) (Fig. 3D). The second crystal packing dimer is 
built around an 'anchor loop' that covers residues 168-179. This 
loop was refined with higher than average B-factors, suggesting 
conformational flexibility that may mediate Pil02 Bp interaction/ 
recognition with other assembly machinery partners. In this 
respect, we found that a native diffraction data set, independently 
collected on the ESRF beamline ID23-1 from a crystal grown 
under lower phosphate concentration and at a different pH value, 
produced an N-Pil02 Bp model lacking the mentioned phosphate 
ion, and presented a larger unit cell (about 10 Å on the c edge). 
Due to even higher flexibility, in the absence of phosphate it was 
not possible to model the anchor loop structure into continuous 
electron density. The conformational flexibility of the loop thus 
appears to be dependent on intermolecular interactions. In fact, in 
the absence of the phosphate ion the two protein molecules (paired 
through crystal packing) move apart, in keeping with the increased 
unit cell size. The third N-Pil02 Bp dimer considered presents a 
wider interface (946 Å²) that might be biologically relevant. 
However, analysis through the PISA server at the European 
Bioinformatics Institute [43] showed that, despite a 30-residue 
interface hosting six hydrogen bonds and six salt bridges, 
the probability for this homodimerisation interface to be 
biologically relevant is low (ΔiG P-value, 0.303; ΔG, -4.1 kcal/ 
mol; Complexation Significance Score (CSS), 0). 

The distribution of electrostatic charges shows that two N- 
Pil02 Bp regions are rich in basic residues: the pocket 
hosting the phosphate ion and the 'anchor loop' (Fig. 3E). 
Considering the location of the expected transmembrane segment 
that follows the N-Pil02 Bp C-terminal residues, such a positively 
charged surface may help the protein interact with a phosphate 
from IM phospholipid head groups. Indeed, the identified 
phosphate ion is located in a pocket in the C-terminal region, 
interacting with Arg61, Arg189 and Asp42. In full-length Pil02 Bp, 
such a pocket may face the membrane and fall in its close 
proximity, thus promoting the interaction with phospholipids. In 
the crystal, the phosphate ion further interacts with Arg50 and 
Arg77 of a symmetry-related molecule. 

Figure 3. N-Pil02 Bp protein. A. Overall fold of N-Pil02 Bp composed of two α/β topology subdomains, each displaying a mixed β-sheet, separated 
by a (central) cleft. The bound phosphate ion is shown as spheres. B. Topology diagram of N-Pil02 Bp. This diagram was generated using the PDBSum 
server (www.ebi.ac.uk/pdbsum/) [52]. C. Crystal packing of the phosphate-containing N-Pil02 Bp structure, showing the three crystal packing dimers 
formed by alternative interactions between four symmetry-related monomers (green, blue, magenta and black). The three interfaces are highlighted 
by black, blue and red shading. The first 'dimer' is formed by the interaction between the green (or blue) and the magenta (or black) monomers and 
the light green (or light blue) phosphate. The second crystallographic dimer occurs between the magenta and black monomers. The third dimer is 
formed by the green and blue monomers. D. Stereo view of the electron density map for the residues building the phosphate ion binding pocket. 
The phosphate ion is shown as a sphere; the electron density is contoured at the 1.5 sigma level. E. Front and back view of the N-Pil02 Bp electrostatic surface 
potential. The electrostatic potential was calculated using the CCP4MG viewer. Negative (red) and positive (blue) charges, and uncharged (white) 
surfaces are shown. F. Superposition of the 3D structures of N-Pil02 Bp (cyan; PDB codes 4BYZ and 4BZ0) and N-BfpC (chocolate; PDB code 3VHJ). 
doi:10.1371/journal.pone.0094981.g003 

N-Pil02 Bp and the Tfpb bitopic protein N-BfpC share the 
same fold 

As no structural homologs of N-Pil02 Bp were evident, we 
used our crystallographic results to search the structural database. 
Using Dali [44], the closest structural homolog of N-Pil02 Bp was 
identified as the N-terminal domain of BfpC (Dali Z-score: 16.8, 
PDB Id 3VHJ, root-mean-square deviation (RMSD) of 2.9 Å over 
159 matched Cα pairs; hereafter N-BfpC), an accessory protein of 
the E. coli Tfpb BFP variant. As highlighted by the low RMSD 
value, the two structures are very similar, but differ in the absence 
of the anchor loop in N-BfpC (residues 168-179 in N-Pil02 Bp, 
Fig. 3F); nevertheless, the two proteins are described as part of two 
distinct assembly machineries. The next Dali hit corresponds to an 
uncharacterized protein from Bacteroides thetaiotaomicron (PDB 
3HRG) with an actin-like ATPase fold (Z-score = 7.4). 

Based on amino-acid sequence only, BfpC is not recognized as a 
member of a Pfam family, a result that would stress substantial 
evolutionary distance from Pil02 Bp. However, a PDB search with 
the recently introduced PDBfam tool [45] recognizes 3VHJ as the 
only PDB structure matching the PF06864 Pfam family. Three 
additional results support BfpC as a member of PF06864. First, 
alignment of the full BfpC sequence (B7UTD4) against our new 
profiles resulted in 69% (Default HMM profile), 62% (EBI profile) 
and 64% (NCBI profile) of the residues aligned. Second, using the 
function READALIGN of the ProFit program 
(http://www.bioinf.org.uk/software/profit/index.html), a superposition of the 
3VHJ structure onto N-Pil02 Bp, based on the sequence alignment 
produced by our HMM profiles, gave low RMSD values for 
the aligned part (3.6 Å using the Default profile, 3.2 Å using the 
EBI profile and 2.8 Å using the NCBI profile). Third, a search of 
Pil02 Bp against uniprot_trembl_bacteria with the jackhmmer 
program (HMMER3 package) using default parameters (five 
iterations) [46] (Pfam main protocol for generating families [19]) 
returned a multiple alignment where both proteins are present. 
Using this alignment to superpose both structures results in an 
RMSD of 3.0 Å. The three pairwise alignments mentioned above, 
together with a structural alignment obtained with the program 
CE [47], are presented in Fig. S1. 
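
As an illustration of how an RMSD "for the aligned part" can be obtained from a sequence alignment, the following minimal Python sketch extracts the residue pairs that are aligned to each other and computes the RMSD over their Cα positions. It is not the ProFit implementation: the function names and the toy coordinates are hypothetical, and the coordinates are assumed to be already superposed (the least-squares fitting step itself is not shown).

```python
import math


def matched_pairs(aligned_a, aligned_b):
    """Return (i, j) residue index pairs aligned to each other.

    aligned_a and aligned_b are two rows of a pairwise alignment of equal
    length, with '-' marking gaps. Indices count residues, not columns.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("alignment rows must have the same length")
    pairs = []
    i = j = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a != "-" and res_b != "-":
            pairs.append((i, j))
        if res_a != "-":
            i += 1
        if res_b != "-":
            j += 1
    return pairs


def rmsd(coords_a, coords_b):
    """RMSD over matched 3D points (assumed to be already superposed)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Toy example (hypothetical sequences and C-alpha coordinates, in angstroms).
pairs = matched_pairs("AC-D", "A-GD")
ca_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
ca_b = [(0.0, 0.0, 0.0), (3.8, 1.0, 0.0), (7.6, 0.0, 2.0)]
value = rmsd([ca_a[i] for i, _ in pairs], [ca_b[j] for _, j in pairs])
```

Only columns without a gap in either row contribute, which is why different alignments of the same two structures (Default, EBI, NCBI or jackhmmer) can give different RMSD values.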

These findings strongly suggest that both BfpC and Pil02 Bp, 
despite their apparent lack of similarity at the sequence level (6.7% 
identity; 16.7% similarity) (Fig. S2), belong to the PF06864 family. 
Importantly, these are the first 3D structures assigned to this Pfam 
family (Fig. 3F). 
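
Percent identity and similarity figures such as those quoted above depend on how they are counted. The Python sketch below shows one possible convention, under stated assumptions: percentages are taken over all alignment columns (gaps included), and "similar" means that two different residues fall in the same group of a hypothetical, illustrative grouping. The grouping and denominator actually used for Fig. S2 are not given in the text, so this sketch is not expected to reproduce the 6.7% and 16.7% values.

```python
# Illustrative residue groups; an assumption, not the scheme used in the paper.
SIMILAR_GROUPS = ["ILVM", "FWY", "KRH", "DE", "NQ", "ST", "AG"]


def same_group(res_a, res_b):
    """True if both residues belong to one of the illustrative groups."""
    return any(res_a in group and res_b in group for group in SIMILAR_GROUPS)


def identity_similarity(aligned_a, aligned_b):
    """Return (percent identity, percent similarity) for a pairwise alignment.

    Both values are computed over all alignment columns, gaps included.
    Similarity counts identical residues plus same-group substitutions.
    """
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("alignment rows must be non-empty and of equal length")
    identical = similar = 0
    for res_a, res_b in zip(aligned_a.upper(), aligned_b.upper()):
        if res_a == "-" or res_b == "-":
            continue  # gap columns count in the denominator only
        if res_a == res_b:
            identical += 1
            similar += 1
        elif same_group(res_a, res_b):
            similar += 1
    columns = len(aligned_a)
    return 100.0 * identical / columns, 100.0 * similar / columns


# Toy alignment: two identities, one conservative change, one mismatch, one gap.
pid, psim = identity_similarity("ACDE-", "ACEQK")
```

Using the number of gap-free columns, or the length of the shorter sequence, as the denominator would give higher percentages for the same alignment, so the convention should always be stated alongside the numbers.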

Discussion 

Tfpb are found in enteric pathogens such as V. cholerae, S. enterica 
and E. coli, and are important for bacterium-to-bacterium 
interactions and for pathogenesis [34]. B. pseudomallei is a Gram- 
negative bacterium endowed with high capability of adapting to a 
wide range of environments, where it can produce biofilm. Both 
characteristics presumably became possible through the acquisi- 
tion of genetic information from other organisms. B. pseudomallei 
K96243 hosts eight Tfp machineries, among which Tfp7 is a Tfpb 
assembly machinery [48] . Such widespread occurrence may imply 
that Tfp7 is not responsible for the virulent nature of B. 
pseudomallei; however, it may be part of the biofilm formation 
machinery, as described for the Tfpb R64 thin pilus in S. enterica. 

We here report work carried out on Pil02 Bp, a component of 
the Tfpb R64 thin pilus variant assembly machinery from B. 
pseudomallei. Despite the fact that Pil02 Bp is a representative 
member of the PF06864 Pfam family, the Pfam algorithm failed to 
align the first 92 residues of this protein with the other PF06864 
members. Pfam is a widely used database of protein families, 
currently containing more than 13000 manually curated families. 
Two types of families are distinguished: high quality, manually 
curated Pfam-A families, and automatically generated Pfam-B 
families. Some Pfam-A families are seeded by structures deposited 
in the Protein Data Bank, and the determination of new structures 
for known families has already led to their extension in the past 
[19,49]. 

Extensive sequence analysis within the 15% representative 
PF06864 members indicated that the definition of PF06864 is 
incomplete. Indeed, the alignment of the N-terminal part of these 
proteins could be substantially improved using new HMM profiles 
for the family, as demonstrated here. In addition, screening the 
uniprot_trembl_bacteria database with our newly developed 
profiles allowed us to identify 43 new PF06864 family members 
that are likely to be Pil02 proteins. 

Prior to the results reported here, the PF06864 family did not 
have a representative 3D structure. Our SAD crystallographic 
approach, based on anomalous scattering from sulfur atoms, 
yielded a high resolution N-Pil02 Bp 3D structure, thus shedding 
first light on the key structural features of this protein family. The 
N-Pil02 Bp 3D structure, according to SCOP, hosts a ribonuclease 
H-like fold, typical of proteins belonging to the actin-like ATPase 
domain superfamily, consisting of a globular moiety composed of 
two similar α/β sub-domains separated by a cleft. 

One of the N-Pil02 Bp crystal packing interfaces hosts a 
phosphate ion, housed in a pocket that could be functionally 
relevant in vivo, mediating the binding to phospholipids of the IM. 
On the other hand, the stabilization of the (otherwise flexible) 
'anchor loop' built by residues 169-179 is obtained thanks to 
intermolecular interactions that occur in a region characterized by 
positively charged residues, suggesting its potential role in the 
assembly with other (macro)molecular partners. 

N-BfpC is the closest known structural homolog of N-Pil02 Bp 
(RMSD of 2.9 Å). Although BfpC is a Tfpb system component, its 
assembly machinery pertains to a variant different from Tfp7. 
However, the two operons share four orthologous genes, with bfpC 
and pil02 Bp being non-orthologous (sequence identity of 6.7%; 
sequence similarity of 16.7%). In fact, BfpC had not been assigned 
to the PF06864 family. Its membership of this family becomes 
clear, however, when the HMM profiles described here are applied. 
In conclusion, Pil02 Bp and BfpC are likely homologous proteins 
sharing negligible sequence identity but high 3D structural similarity 
(in their N-terminal 194-residue segment), despite the absence of 
the N-Pil02 Bp 'anchor loop' in N-BfpC. Although part of two 
different machineries, Pil02 Bp and BfpC may share similar 
functions. Such a proposal would be in keeping with the 
observation that both are accessory proteins in Tfpb assembly 
machineries, that they comprise two domains linked by a TM 
helix, and that their N-terminal domains share the same overall 
fold. Thus, we could speculate that the N-Pil02 Bp domain falls in 
the cytoplasmic compartment, where it might associate with the 
cytoplasmic domain of the PilQ2 protein (a BfpD homolog) and 
with the N-terminal domain of the PilR2 protein (a BfpE 
homolog), in line with the reported association of N-BfpC with 
BfpD and BfpE [50]. Notably, N-BfpC was identified as a 
structural homolog of N-EpsL from the Type II secretion system 
[51]. N-EpsL interacts with the N-terminal part of the EpsE 
ATPase. The PilT2 Bp ATPase from Tfp7 shares 19% homology 
with EpsE, but lacks the first 110 residues that are responsible for 
this association. Such structural features may imply that PilT2 Bp 
and Pil02 Bp do not associate, or that their mutual recognition is 
based on different principles. 

In conclusion, coupling thorough sequence analyses, database 
mining, and a new S-SAD phased crystal structure, led to two 
innovative discoveries within the PF06864 Pfam family. On one 
hand, the establishment of new HMM profiles enabled a full 
sequence alignment of Pil02 Bp to other members of the family and 
prompted the identification of 43 new members. On the other 
hand, crystallographic analysis of N-Pil02 Bp provided the first 3D 
structure of a PF06864 family member, contributing to the 
characterization of the Tfpb assembly machinery in the R64 thin 
pilus variant. 

Supporting Information 

Figure S1 Comparison of pairwise sequence alignments 
of N-Pil02 Bp (UniProt Id Q63JW5, structure presented in this 
work) and BfpC (UniProt Id B7UTD4, PDB Id 3VHJ) obtained 
with different approaches. 1st pair: from comparing Q63JW5 to all 
sequences in the uniprot_trembl_bacteria database using jackhmmer 
(HMMER3 package). 2nd pair: structure superposition using 
CE. 3rd pair: hmmalign (HMMER3 package) against the Default 
HMM profile. 4th pair: hmmalign against the EBI HMM profile. 
5th pair: hmmalign against the NCBI HMM profile. 
(DOCX) 
Figure S2 Sequence and secondary structural alignment 
between Pil02 Bp and BfpC. 
(DOCX) 

Table S1 New protein assignments to the PF06864 
family using the newly created Default, EBI and 
NCBI HMM profiles. 

(DOCX) 